
Fix model-downloader and tgi in multi shard case #642

Merged
lianhao merged 2 commits into opea-project:main from bug641 on Dec 17, 2024

Conversation

Collaborator

@lianhao lianhao commented Dec 12, 2024

Description

Upgrade huggingface-hub to version 0.26.5 when downloading models, because the existing huggingface/downloader:0.17.3 image does not honor HF_TOKEN correctly.

Loosen the tgi securityContext to allow running with multiple shards.
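As a minimal sketch of the first fix, the downloader can pass the token explicitly instead of relying on implicit pick-up (the helper names and the `/data` cache path here are illustrative assumptions, not code from this PR; `snapshot_download` accepting a `token` argument is standard huggingface_hub API):

```python
import os

def resolve_token():
    # Prefer HF_TOKEN; fall back to the legacy variable name if it is set.
    return os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")

def download_model(repo_id, cache_dir="/data"):
    # Import here so resolve_token() stays usable even without the package installed.
    # Assumes huggingface-hub>=0.26.5, per this PR.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, cache_dir=cache_dir, token=resolve_token())
```

With an explicit `token=` argument, a gated model download fails fast with a clear authentication error when the token is missing, rather than silently downloading as an anonymous user.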

Issues

Fixes #641
Fixes #639

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Dependencies

List the newly introduced 3rd party dependency if exists.

Tests

Describe the tests that you ran to verify your changes.

@lianhao lianhao requested a review from yongfengdu as a code owner December 12, 2024 08:51
@lianhao lianhao requested a review from Ruoyu-y December 12, 2024 08:52
@lianhao lianhao changed the title from "Use huggingface-hub 0.26.5 to download model" to "Fix model-downloader and tgi in multi shard case" on Dec 12, 2024
Contributor

@eero-t eero-t left a comment


Approved.

Keeping the cache in an emptyDir, i.e. discarding it when the pod instance goes away (instead of sharing it like the model data), is more secure, but it can be a significant pod-startup performance issue, especially with HPA and other setups where pods come and go. That can be looked at in another PR, though.
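The trade-off above boils down to a volume choice; a minimal sketch of the two options (volume and claim names are illustrative, not taken from the actual chart):

```yaml
# Per-pod scratch cache: more secure, but re-downloaded on every pod start.
volumes:
  - name: model-cache
    emptyDir: {}

# Shared persistent cache: faster startup under HPA churn, but shared state.
# volumes:
#   - name: model-cache
#     persistentVolumeClaim:
#       claimName: model-cache-pvc
```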

I think in the long term it would be better to separate model downloading from running the application services. That would allow model downloading to be centralized in a single service / container, instead of split over multiple pods, and even to be done as a separate step before starting any of the application pods.
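As an illustration of that idea, a hypothetical pre-download Job could populate a shared volume once, before any application pod starts (the Job name, image, secret, claim name, and model id below are all assumptions for the sketch, not part of this PR):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: model-downloader
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: downloader
          image: python:3.11-slim
          env:
            - name: HF_TOKEN          # token injected from a Secret
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
          command: ["sh", "-c"]
          args:
            - >
              pip install 'huggingface-hub==0.26.5' &&
              huggingface-cli download Intel/neural-chat-7b-v3-3 --cache-dir /data
          volumeMounts:
            - name: model-volume
              mountPath: /data
      volumes:
        - name: model-volume
          persistentVolumeClaim:
            claimName: model-pvc
```

Application pods would then mount the same claim read-only, so no serving pod needs network access to the Hub at startup.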

Collaborator

@Ruoyu-y Ruoyu-y left a comment


LGTM.

@lianhao lianhao merged commit a4a96ab into opea-project:main Dec 17, 2024
18 checks passed
@lianhao lianhao deleted the bug641 branch December 17, 2024 06:02